Abstract

The percentage of students at/above a cut point (PAAC) is one of the most
common measures used for reporting school-level performance relative to a proficiency
standard (Cronbach, Bradburn, & Horvitz, 1994). The two purposes of this study were to
introduce procedures for estimating standard errors for school PAAC’s under a
generalizability theory model and to examine the influence of different student sampling
plans on the standard errors. A strong relationship was found between the standard error
for school PAAC and the number of students in a school. Infinite- and finite-population
assumptions for students provided somewhat different standard errors when a relatively
small number of students was used for estimating school PAAC’s.

The Influence of Student Sampling Plan
on Standard Error for School PAAC

Assessing student achievement in terms of proficiency standards that have been set
on a test is a common practice as a result of the educational reform movement, Title I
requirements, and public demands for education accountability (Lewis, Green, Mitzel,
Baum, & Patz, 1998). School districts, states, and the nation use standards such as ‘basic’,
‘proficient’, and ‘advanced’ to describe students’ overall level of achievement (Berk, 1986;
Jaeger, 1989; Kane, 1994). School-level reports as well as student-level reports that
describe performance relative to such standards have been recommended for assessing
schools’ progress (Cronbach, Bradburn, & Horvitz, 1994).

The percentage of students at/above a cut point (PAAC) is one of the most 
common measures used for reporting school-level performance relative to a proficiency 
standard (Cronbach et al., 1994), and it has been recommended that the standard error of 
this PAAC also be reported. For example, the Standards for Educational and Psychological 
Testing (American Educational Research Association, American Psychological 
Association, & National Council on Measurement in Education, 1999) states that when 
average test scores for groups are used, “the standard error of the group mean should be 
reported, as it reflects variability due to sampling of examinees as well as variability due to 
measurement error” (Standard 2.19, p. 36). Such evidence about the uncertainty attached to 
a set of scores is required to avoid over-interpretation of the scores (Cronbach, Linn, 
Brennan, & Haertel, 1997).

The two purposes of this study were to introduce procedures for estimating
standard errors for school PAAC’s under a generalizability theory model and to examine
the influence of different student sampling plans on the standard errors for school PAAC’s.

In this paper, a dummy variable is used. This variable is dichotomously
coded 1 or 0 to represent a student’s pass/fail status, and it can be expressed as

$$
S_i = \begin{cases} 1, & \text{if } X_i \ge C \\ 0, & \text{otherwise,} \end{cases} \qquad (1)
$$

where $S_i$ is the status score for student i, $X_i$ is the test score for student i, and C represents
the cut score. The average of scores on this dummy variable over students can be
transformed to the PAAC score by



$$
\mathrm{PAAC}_j = \frac{\sum_{i=1}^{I_j} S_i}{I_j} \times 100, \qquad (2)
$$

where $\mathrm{PAAC}_j$ is the PAAC score for school j and $I_j$ is the number of students who took a test
in school j.
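
The status coding and the PAAC transformation described above can be sketched in a few lines of Python. This is only an illustration: the function names and the five example scale scores are hypothetical, and the cut score of 475 is borrowed from the grade 4 value reported later in the paper.

```python
from typing import List, Sequence


def status_scores(test_scores: Sequence[float], cut_score: float) -> List[int]:
    """Equation 1: code a student 1 if at/above the cut score, else 0."""
    return [1 if x >= cut_score else 0 for x in test_scores]


def school_paac(test_scores: Sequence[float], cut_score: float) -> float:
    """Equation 2: mean status score in one school, expressed as a percentage."""
    if not test_scores:
        raise ValueError("a school needs at least one tested student")
    statuses = status_scores(test_scores, cut_score)
    return 100.0 * sum(statuses) / len(statuses)


# Hypothetical scale scores for one school (not data from the study).
example_scores = [430, 475, 512, 468, 490]
example_statuses = status_scores(example_scores, cut_score=475)
example_paac = school_paac(example_scores, cut_score=475)
```

A student scoring exactly at the cut score counts as passing, which follows the "at/above" definition of the PAAC.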



Generalizability Theory Approaches to Standard Errors 

The univariate p:(s×f) generalizability study (G-study) design involving persons
(p) nested within schools (s) and test forms (f) was used to estimate variance components
in the current study. The linear model for the response of a person within a school and a
form treats schools as objects of measurement and persons and forms as random facets.
The linear model can be represented as

$$
X_{psf} = \mu + \mu_{s\sim} + \mu_{f\sim} + \mu_{sf\sim} + \mu_{p:sf,e\sim}. \qquad (3)
$$

The terms on the right-hand side are the grand mean, the school effect, the form effect, the
school by form interaction effect, and the person within school and form effect confounded
with unexplained sources of error, respectively.

A decision study (D-study) is conducted for the purpose of determining the most
efficient measurement procedures and/or estimating reliability coefficients and standard
errors of measurement. The analyst must decide which universe is of interest. That
is, the universe of generalization is one of the most important D-study considerations in
applying generalizability theory in practice. In the current study, three possible
universes of generalization that have different student sampling plans are considered, and
the associated formulas for estimating standard errors for school PAAC’s are provided.

Let $n'_p$ denote the sample size for students and $N_p$ denote the population
size for students. If an investigator is interested in generalizing school
PAAC’s to an infinite student population beyond the students recently taught, it is
appropriate to use the infinite universe definition for students. This student
sampling plan is denoted Sampling Plan 1 (SP1) in this study, and it requires the
assumption that $n'_p < N_p \to \infty$. That is, the investigator assumes that the students tested
in a school are simply a sample from an infinite universe of students. The standard
error for the PAAC in this situation is estimated by

$$
SE(\mathrm{SP1}) = 100 \times \sqrt{\frac{\hat{\sigma}^2(f)}{n'_f} + \frac{\hat{\sigma}^2(sf)}{n'_f} + \frac{\hat{\sigma}^2(p{:}sf)}{n'_p\, n'_f}}, \qquad (4)
$$
where the estimates of variance components from a G-study are:

$\hat{\sigma}^2(f)$ = the estimate of variance for forms;

$\hat{\sigma}^2(sf)$ = the estimate of variance for interactions of schools and forms;

$\hat{\sigma}^2(p{:}sf)$ = the estimate of variance for students nested within school by form;

and $n'_f$ and $n'_p$ represent the number of forms and the number of students per form within
a school, respectively.

Another decision-maker simply wants to draw conclusions about a particular
school’s performance in a particular year and tests a sample of students. These students can
be considered a sample from a finite population. This is called Sampling Plan 2 (SP2) in
this study, with the specification $n'_p < N_p < \infty$. This investigator’s universe of
generalization is “restricted.” It is concerned only with a finite universe, but this does not
mean that this investigator’s universe is worse than that of the previous investigator.
The two investigators merely have different conceptualizations of the universe of
generalization. The standard error for SP2 is estimated by

$$
SE(\mathrm{SP2}) = 100 \times \sqrt{\frac{\hat{\sigma}^2(f)}{n'_f} + \frac{\hat{\sigma}^2(sf)}{n'_f} + \left(1 - \frac{n'_p}{N_p}\right)\frac{\hat{\sigma}^2(p{:}sf)}{n'_p\, n'_f}}. \qquad (5)
$$

The meanings of the variance components and sample sizes are the same as defined in
Equation 4.

A student facet is considered fixed when the sample of students tested serves as the
population of students to which the test results are generalized. In this case, the relation
$n'_p = N_p < \infty$ is assumed. Sampling Plan 3 (SP3) is used here to describe this situation. The
analyst wants to make inferences about school performance only in a specific year and
with regard to a specific group of students. The test results are used to describe only the
students who participated in the testing program. Thus, the universe of generalization
for SP3 is more “restricted” than those of SP1 and SP2. The standard error for
school PAAC under this specification is

$$
SE(\mathrm{SP3}) = 100 \times \sqrt{\frac{\hat{\sigma}^2(f)}{n'_f} + \frac{\hat{\sigma}^2(sf)}{n'_f}}, \qquad (6)
$$

where the other notation is the same as defined in Equations 4 and 5.

A comparison among the three formulae for estimating standard
errors described above is useful in understanding the relationship between the sampling
plan and the standard error estimate for a school PAAC. The most important
difference is the finite-population correction factor, $(1 - n'_p/N_p)$,
shown in Equation 5.

Because the correction factor is less than 1, Equation 5 produces smaller
standard errors than does Equation 4. If $N_p$ is infinite, as in SP1, the correction
factor in Equation 5 will be 1 and Equation 5 will be the same as Equation 4. In
contrast, because Equation 6 does not include the variance component for
“persons within schools by forms,” it produces the smallest standard errors
among the three. In SP3, because $n'_p$ is equal to $N_p$, the correction factor in
Equation 5 will be 0. Consequently, the last term in Equation 5 disappears
and Equation 5 becomes the same as Equation 6.

From these relations, we can anticipate that the standard error estimates for the
three sampling plans will be ordered SE(SP1) > SE(SP2) > SE(SP3).
This inequality is to be expected because a sampling plan with a
broader universe of generalization produces a larger standard error.
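
The ordering of the three standard errors can be checked numerically with a small sketch. Several things here are assumptions rather than statements from the paper: the function name and argument names are hypothetical, the finite-population correction is taken to be (1 - n_p/N_p), and three forms per school are assumed. The variance components are the grade 4 estimates reported in Table 2.

```python
import math


def se_school_paac(var_f, var_sf, var_p_sf, n_forms, n_students, pop_size=math.inf):
    """Standard error (in percentage points) of a school PAAC.

    pop_size = inf        -> Sampling Plan 1 (infinite student universe)
    n_students < pop_size -> Sampling Plan 2 (finite student population)
    n_students = pop_size -> Sampling Plan 3 (students fixed)
    The correction factor (1 - n/N) is an assumed form.
    """
    if n_students > pop_size:
        raise ValueError("sample size cannot exceed population size")
    correction = 1.0 - n_students / pop_size  # 1 when N is infinite, 0 when n == N
    error_var = (
        var_f / n_forms
        + var_sf / n_forms
        + correction * var_p_sf / (n_students * n_forms)
    )
    return 100.0 * math.sqrt(error_var)


# Grade 4 estimates from Table 2; n_forms=3 is an assumption.
grade4 = dict(var_f=0.00004, var_sf=0.00303, var_p_sf=0.20087, n_forms=3)

sp1 = se_school_paac(**grade4, n_students=10)                # infinite universe
sp2 = se_school_paac(**grade4, n_students=10, pop_size=200)  # finite population
sp3 = se_school_paac(**grade4, n_students=10, pop_size=10)   # students fixed
```

Under these assumptions the three values satisfy sp1 > sp2 > sp3, which is the ordering anticipated in the text.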

Method 

Data Sources 

The tests used in this study were the Mathematics tests for grades 4 and 8 from a
statewide assessment. There were three test forms for each grade, and each test form was
composed of 80 multiple-choice (MC) items and 3 or 4 constructed-response (CR) items.
The tests measured students’ mathematics computation and application skills. The three
forms were randomly assigned to students within a school by following spiraling
procedures to produce randomly equivalent groups. More than 25,000 students took each test
form within a grade. Student sample sizes and general characteristics of each test form are
presented in Table 1.

Insert Table 1 About Here 

Analyses

Two item response models were used for scaling: the three-parameter logistic
model (Lord & Novick, 1968; Lord, 1980) was used to scale the MC items and the
two-parameter partial credit model (Muraki, 1992; Yen, 1993) was used to scale the CR items.
The item parameters were estimated using the PARDUX computer application program
(Burket, 1996).

A cut score was set at a scale score of 475 in grade 4, which corresponded to the
40th percentile of the score distribution. In grade 8, the cut score was set at a scale score of
461, which corresponded to the 45th percentile. Cut scores near the 40th percentile were
chosen as realistic examples of cut scores that might be set. With a cut score for each grade,
students were classified into dichotomous pass/fail categories and coded 1 or 0,
respectively. The percentage of students at/above the cut score in each school was computed
using the formula expressed in Equation 2.

The generalizability study analyses were conducted to estimate variance
components. Because the number of students for each school and each form varied, the
conditions for a balanced design were not met. ANOVA-like procedures were used with
the urGENOVA computer application program (Brennan, 1999) for estimating variance
components for an unbalanced design. Using the variance component estimates from the
generalizability study, standard errors for school PAAC’s were estimated in several
D-studies, with the number of students varying from 10 to 200 and the number of forms
varying from 1 to 6.

Results and Discussion

G-Study 

Variance component estimates for the random effects p:(s×f) G-study design are
presented in Table 2. The variance components in a G-study represent the observed score
variance for a single student in a single school on a single test form. The school variance is
an estimate of the variance of schools’ mean scores over students and test forms. Table 2
also shows that the percentage of total variance associated with schools was 8.1%
for grade 4 and 7.4% for grade 8. Form variance represents the variation of form mean
scores over all schools and students, and the percentages were small, 0.0% for grade 4 and
0.2% for grade 8. The magnitude of the school by form interaction variance component
shows the degree to which the rank orderings of schools varied across forms. These
interactions were also small, 1.4% for grade 4 and 0.0% for grade 8. The largest variance
component was that for ‘students nested within schools and forms’, $\hat{\sigma}^2(p{:}sf)$. Because this
variance component includes variance due to unexplained sources of error, it
is not surprising that it was relatively large.
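
The percentages quoted above follow directly from the estimates in Table 2: each component is divided by the sum of all four. A minimal sketch of that arithmetic is given here; the dictionary keys and the function name are illustrative labels, not notation from the paper.

```python
from typing import Dict

# Variance component estimates as reported in Table 2.
GRADE4 = {"s": 0.01800, "f": 0.00004, "sf": 0.00303, "p:sf,e": 0.20087}
GRADE8 = {"s": 0.01480, "f": 0.00044, "sf": 0.00002, "p:sf,e": 0.18556}


def percent_of_total(components: Dict[str, float]) -> Dict[str, float]:
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


grade4_pct = percent_of_total(GRADE4)
grade8_pct = percent_of_total(GRADE8)
```

Rounded to one decimal place, the results agree with the percentage column of Table 2.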

Insert Table 2 About Here 



D-Study 

Figure 1 shows the standard error estimates resulting from the use of the three
student sampling plans for 20 student sample sizes (10, 20, 30, ..., 200).
For computing standard errors for school PAAC’s under SP2, the student
population size was set to 200 for convenience.

Insert Figure 1 About Here 



In both grades, the plots show that the standard errors for the SP1 were consistently 
greater than the standard errors for the SP2, regardless of student sample size. This finding 
is predictable given the relations among standard error formulae explained in the previous 
section. The differences in the standard errors for the two sampling plans increased as the 
number of students within school increased. 

Also in both grades the standard errors for the SP3 consistently were lower than 
those associated with the SP1 and SP2. The results for the SP3 did not vary with student 
sample sizes. This occurred because in the SP3 the student sample was fixed and referred 
to the whole student population of interest. 

The SP2 produced standard errors for school PAAC’s that appeared between the 
values produced by SP1 and SP3. Therefore, the standard errors for the SP1 and SP3 can 
be considered upper- and lower-bound for the SP2 standard errors. 

The difference between the standard errors from SP1 and SP3 can be regarded as
defining the range of possible values for the SP2 standard errors. For example, if a student
sample size of 10 was used in grade 4, SP2 could produce SE’s between 3.2% and
8.8%, a range of 5.6%. In contrast, if a student sample size of 100 was used, the standard
errors for SP2 would have values between 3.2% and 4.1%, and the range would be less
than 1%.
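
As a worked check, the grade 4 bounds quoted above can be recovered from the Table 2 estimates. This assumes three forms per school ($n'_f = 3$, the number of forms administered per grade; the value used for Figure 1 is not stated) and assumes that the SP1 and SP3 standard errors have the forms written out below:

```latex
\begin{align*}
SE(\mathrm{SP3}) &= 100\sqrt{\frac{0.00004 + 0.00303}{3}}
                  = 100\sqrt{0.001023} \approx 3.2 \\
SE(\mathrm{SP1}),\; n'_p = 10:\quad
  & 100\sqrt{0.001023 + \frac{0.20087}{10 \times 3}}
    = 100\sqrt{0.007719} \approx 8.8 \\
SE(\mathrm{SP1}),\; n'_p = 100:\quad
  & 100\sqrt{0.001023 + \frac{0.20087}{100 \times 3}}
    = 100\sqrt{0.001693} \approx 4.1
\end{align*}
```

The agreement with the reported 3.2%, 8.8%, and 4.1% suggests that the three-form assumption matches the configuration behind Figure 1.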

These results and the plots also show that the three student sampling plans
produced somewhat different SE’s for relatively small student samples (e.g., fewer than
50). However, they did not produce meaningfully different standard errors when a sufficiently
large number of students (e.g., more than 100) was used for estimating school PAAC.

The SP2 standard errors estimated using different student sample sizes and 
population sizes are given in Figure 2. In both grades, student sample size had notable 
effects on the size of the standard errors for school PAAC’s. For the population of 50, 
increasing the sample size from 10 students to 20 students produced a decrease in the 
standard errors of about 2.5%. Increasing the sample size from 20 to 30 produced a 
decrease of about 1.1%. Further reductions in the SE’s were obtained by further increasing 
the sample size, although the rate of reduction slowed down. The positive effects of 
increasing the sample size were similar across populations that ranged in size from 10 to 
300. Also, it is useful to note that the effects of student sample size mitigated the effects of 
student population size. That is, as the student sample size increased, the effects of student 
population size on the standard errors for school PAAC’s diminished. 
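
The pattern of diminishing returns described above can be reproduced approximately with a short sketch. It rests on assumptions that the text does not state explicitly: the grade 4 variance components from Table 2, three forms per school, and a finite-population correction of the form (1 - n_p/N_p). The function and constant names are hypothetical.

```python
import math

# Grade 4 estimates from Table 2; three forms per school is an assumption.
VAR_F, VAR_SF, VAR_P_SF, N_FORMS = 0.00004, 0.00303, 0.20087, 3


def se_sp2(n_students: int, pop_size: int) -> float:
    """SP2 standard error (percentage points) with an assumed (1 - n/N) correction."""
    if not 0 < n_students <= pop_size:
        raise ValueError("need 0 < n_students <= pop_size")
    correction = 1.0 - n_students / pop_size
    error_var = (VAR_F + VAR_SF) / N_FORMS + correction * VAR_P_SF / (n_students * N_FORMS)
    return 100.0 * math.sqrt(error_var)


# Standard errors for a population of 50 students at three sample sizes.
se_by_n = {n: se_sp2(n, pop_size=50) for n in (10, 20, 30)}
drop_10_to_20 = se_by_n[10] - se_by_n[20]
drop_20_to_30 = se_by_n[20] - se_by_n[30]
```

Under these assumptions the two decreases come out close to the reported values of about 2.5% and 1.1%, and the second drop is less than half the first, which illustrates the slowing rate of reduction.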

Insert Figure 2 About Here 



Figure 2 also shows that, for a given student sample size, the effects of student
population size on the standard errors for school PAAC’s were small once the student
population size was reasonably large (e.g., greater than 150). In this situation, the SP1 method
could be an alternative to the SP2 method in estimating standard errors. SP1 is a
simpler method than SP2 because it does not depend upon student population
size.

To investigate the form effects on the standard errors for school PAAC’s, several
D-studies were completed under the SP1 and SP3 specifications. The analyses for SP2
were not performed because they would require a large number of combinations of
student sample sizes and population sizes as inputs. However, because the
standard errors for SP2 lie between those for SP1 and SP3, we can predict the range of the
SP2 standard errors. The form effects on standard errors for school PAAC’s are presented
in Figure 3.

Insert Figure 3 About Here 



In this figure, the total number of students sampled was fixed at 120
per school. That is, if two forms were used, 60 students were assumed to take the first form
and another 60 students were assumed to take the second form. If three forms were
involved, three groups of 40 students were each assumed to take one of the three forms.
Consequently, the total number of students sampled for a school remained constant
regardless of the number of forms. Even after controlling the total number of students per
school used for estimating school PAAC’s, we can still observe non-negligible form
effects. Fitzpatrick, Lee, and Gao (in press) and Yen (1997) reported similar form effects
in their papers.
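
The fixed-total design described above can be sketched as follows. This is an illustration under assumptions: the grade 4 estimates from Table 2 are used, the SP1 and SP3 standard errors are assumed to take the forms sqrt((var_f + var_sf)/n_f + var_p_sf/(n_p n_f)) and sqrt((var_f + var_sf)/n_f), and the function names are hypothetical.

```python
import math

# Grade 4 variance component estimates reported in Table 2.
VAR_F, VAR_SF, VAR_P_SF = 0.00004, 0.00303, 0.20087
TOTAL_STUDENTS = 120  # total students sampled per school, held constant


def se_sp1(n_forms: int) -> float:
    """SP1 standard error when the 120 students are split evenly across forms."""
    n_students_per_form = TOTAL_STUDENTS / n_forms
    error_var = (VAR_F + VAR_SF) / n_forms + VAR_P_SF / (n_students_per_form * n_forms)
    return 100.0 * math.sqrt(error_var)


def se_sp3(n_forms: int) -> float:
    """SP3 standard error: the student term is absent because students are fixed."""
    return 100.0 * math.sqrt((VAR_F + VAR_SF) / n_forms)


form_counts = list(range(1, 7))  # 1 to 6 forms, as in the D-studies
sp1_by_forms = [se_sp1(k) for k in form_counts]
sp3_by_forms = [se_sp3(k) for k in form_counts]
```

Because the product of students per form and number of forms is always 120, the student term in the SP1 expression stays constant. Any reduction in the standard error as forms are added therefore comes from the form and school by form components alone.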



Conclusions 

One of the primary purposes of current state assessment programs is to measure 
progress to performance standards at the aggregate level. Some states may do census- 
testing, but others may sample students for testing. The effect of student sampling on 
aggregate-level performance measures should be a critical issue in making inferences from 
these measures. Three student sampling plans were investigated in this study in the context 
of estimating standard errors for school PAAC’s. Based upon the results, the following 
generalizations can be offered: 

First, the standard errors for school PAAC’s depend primarily upon the number of
students in a school who take each test form. Thus, standard errors for
school PAAC’s should be reported in relation to student sample size.

Second, the different assumptions in student sampling plans provide different
standard errors for school PAAC’s. When relatively few students (fewer than 50) are
sampled for estimating school PAAC’s, the three assumptions will lead to different SE
estimates. However, if a sufficiently large number of students is sampled (more
than 100), they will provide similar standard errors.

Third, the effects of student sampling from a finite population are clear for
small samples of students. In this case, student population size should be considered.
However, if the student population size is reasonably large, the infinite-population method
can be an alternative to the finite-population method.

Fourth, form effects are evident. Using two forms instead of one form can reduce
standard errors for school PAAC’s by non-negligible amounts. Controlling form effects
could be considered a practical way to obtain a targeted standard error for school PAAC.

TABLE 1

Scale Score Descriptive Statistics for Student Performance
on Three Test Forms in a Grade

                    No. of Items                        Scale Score
               -----------------------               ------------------
Grade   Form   Multiple    Constructed    No. of     Mean     Standard
               Choice      Response       Students            Deviation
-----   ----   --------    -----------    --------   ----     ---------
4       A      80          3              28,821     492      89
        B      80          4              28,103     491      88
        C      80          3              27,543     490      97
8       A      80          3              26,935     464      112
        B      80          4              26,404     462      112
        C      80          3              25,844     464      110

TABLE 2

Variance Component Estimates for the Random Effects p:(s×f) Generalizability Study Design
With Unequal Numbers of Students (p) Per School (s) and Form (f)

Variance Component     Estimate     Percentage of Total Variance
------------------     --------     ----------------------------
Grade 4 Mathematics
  σ̂²(s)                0.01800       8.1
  σ̂²(f)                0.00004       0.0
  σ̂²(sf)               0.00303       1.4
  σ̂²(p:sf,e)           0.20087      90.5
Grade 8 Mathematics
  σ̂²(s)                0.01480       7.4
  σ̂²(f)                0.00044       0.2
  σ̂²(sf)               0.00002       0.0
  σ̂²(p:sf,e)           0.18556      92.4

Figure 1. The effects of student sample size on standard errors for
school PAAC’s for three student sampling plans.
[Two panels: Grade 4 Mathematics and Grade 8 Mathematics. Vertical axis: Standard Error of School PAAC.]



Figure 2. The student population size effect on standard errors for school PAAC's for
student sampling from the finite population.
[Two panels: Grade 4 Mathematics and Grade 8 Mathematics. Vertical axis: Standard Error for School PAAC Score.]

Figure 3. The form effects on the standard errors for school PAAC's given the same total number of
students sampled per school.
[Two panels: Grade 4 Mathematics and Grade 8 Mathematics.]